BMJ Health & Care Informatics
All preprints, ranked by how well they match the content profile of BMJ Health & Care Informatics, based on 13 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Jeffrey, M.; Auyoung, E.; Pak, D.
Objective: Educating clinicians about Artificial Intelligence (AI) is an urgent need(1), both because the UK General Medical Council (GMC) places liability with practitioners(2) and the EU AI Act places a duty on employers to provide appropriate training(3), and because AI, like any tool, requires training to use safely. The NHSE Capability Framework provides guidance(4), but frontline clinicians' perspectives are unknown, so we sought to identify their priorities. Methods and Analysis: Using iterative interviews with residents, educators and experts, we synthesised 10 contextualised AI-related problem statements. We surveyed residents and consultant-educators in the East of England, who rated their confidence in, and the importance of, each statement. Participants also ranked their preferred learning modality. Results: We received 299 responses. Clinicians' priorities, defined by high importance (I) and low confidence (C), were: understanding liability implications (I: 40%; C: 1.82/5), determining appropriate levels of confidence in AI algorithms (I: 36.5%; C: 1.98/5), and mitigating security and privacy risks (I: 34%; C: 1.68/5). Confidence was low overall (mean 20, range 10-50), with no significant difference between educators and residents. Residents preferred integration of training into regional teaching, while consultant-educators favoured webinars. Conclusion: Our findings show that clinicians prioritise practical concerns, such as liability and determining confidence in algorithmic outputs. In contrast, critical appraisal and explaining AI to patients were deprioritised, despite their relevance to clinical safety. This study enhances the NHSE Capability Framework by contextualising AI-related capabilities for clinicians as users and identifying priorities with which to develop scalable training. Key Messages: What is already known on this topic: While clinicians face legal accountability for their use of AI in healthcare(2,3,5), there remains no standardised educational pathway to support them in acquiring the necessary skills. Although expert-informed capability frameworks exist(6), they are necessarily broad and lack operational clarity for day-to-day clinical roles. What this study adds: This study translates 31 AI-related capabilities from the NHSE DART-Ed Capability Framework(6) into 10 concise AI learning needs for clinicians of the user archetype, through iterative interviews with residents, educators and AI experts. A regional survey with 299 responses from residents and educators highlights practical concerns, such as liability and determining appropriate confidence in AI algorithms, as learners' priorities, whilst critical appraisal and explaining AI to patients were deprioritised despite their relevance to clinical safety. How this study might affect research, practice or policy: The educational priorities of clinicians as users of AI identified in this study provide engaging, curriculum-ready content mapped to the user archetype of the DART-Ed framework, which can be adapted to role- and task-specific educational activities.
Healy, J.; Kossoff, J.; Lee, M.; Hasford, C.
Objective: A paper from Goh et al found that a large language model (LLM) working alone outperformed American clinicians assisted by the same LLM in diagnostic reasoning tests [1]. We aimed to replicate this result in a UK setting and explore how interactions with the LLM might explain the observed gaps in performance. Methods and Analysis: This was a within-subjects study of UK physicians. 22 participants answered structured questions on 4 clinical vignettes. For 2 of the cases, physicians had access to an LLM via a custom-built web application. Results were analysed using a mixed-effects model accounting for case difficulty and the variability of clinicians at baseline. Qualitative analysis involved coding of participant-LLM interaction logs and evaluating the rates of LLM use per question. Results: Physicians with LLM assistance scored significantly lower than the LLM alone (mean difference 21.3 percentage points, p < 0.001). Access to the LLM was associated with improved physician performance compared to using conventional resources (73.7% vs 66.3%, p = 0.001). There was significant heterogeneity in the degree of LLM-assisted improvement (SD 10.4%). Qualitative analysis revealed that only 30% of case questions were directly posed to the LLM, suggesting that under-utilisation of the LLM contributed to the observed performance gap. Conclusion: While access to an LLM can improve diagnostic accuracy, realising the full potential of human-AI collaboration may require a focus on training clinicians to integrate these tools into their cognitive workflows and on designing systems that make these integrations the default rather than an optional extra.
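The abstract names a mixed-effects model accounting for case difficulty and baseline clinician variability but gives no further specification. A minimal sketch of how such a model might be fit in Python with statsmodels follows; the column names (score, arm, case_difficulty, clinician) and the long-format layout are assumptions, not the authors' actual data.

```python
# Hedged sketch: a linear mixed-effects model of the kind the abstract
# describes. One row per clinician x vignette; column names are invented.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("vignette_scores.csv")  # hypothetical long-format data

# A random intercept per clinician captures baseline variability;
# case difficulty enters as a fixed effect alongside the study arm.
model = smf.mixedlm("score ~ arm + case_difficulty", df, groups=df["clinician"])
result = model.fit()
print(result.summary())
```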
Vecellio, M. I. B.
Primary care artificial intelligence adoption among United States (US) physicians accelerated from 38% to 66% within one year. Implementation strategies typically assume physician resistance as the primary barrier; however, emerging evidence suggests a different challenge, in which enthusiastic adoption precedes adequate knowledge development. Aims: To assess physician readiness for AI implementation in organized primary care, including knowledge levels, attitudes, implementation priorities, and actual usage patterns among Swiss primary care physicians. Methods: Multicentric cross-sectional survey involving four regional subnetworks as study centers (Zurich, Bern, Ticino, Romandie) within mediX Switzerland, conducted August-September 2024. The mediX network comprises 900+ primary care physicians across three Swiss linguistic regions operating within a hybrid managed care model. Online survey of 620 primary care physicians yielding 155 analyzable responses (25.8% response rate). Analysis employed Wilson score confidence intervals for proportions, Cohen's h effect sizes, and sensitivity analyses addressing both non-response bias and knowledge threshold definitions. Results: A pronounced knowledge-attitude gap emerged among respondents. While 69.0% (95% CI: 61.4%-75.8%) expressed positive attitudes toward AI and 81.9% (95% CI: 75.1%-87.2%) sought training opportunities, only 14.8% (95% CI: 10.1%-21.3%) self-assessed their knowledge as high or excellent (levels 4-5), our primary threshold for adequate knowledge. Even when including moderate self-assessed knowledge (level 3+), only 47.1% met this threshold, indicating a persistent 21.9 percentage point knowledge-attitude gap. Critically, 27.7% (95% CI: 21.3%-35.3%) already use AI tools for clinical purposes notwithstanding acknowledged competency gaps. Non-response sensitivity analyses suggest population-level training interest ranges from 20.5% to 81.3% depending on assumptions about non-responders. Physicians demonstrated clear implementation preferences: immediate priority for administrative support (80.0%) and image analysis (73.5%), medium-term priority for medication management (64.5%) and diagnostic support (61.9%), and a long-term perspective for complex applications. Conclusions: Among AI-engaged physicians, this exploratory study reveals a substantial knowledge-attitude gap and documents current AI usage patterns that may precede formal knowledge acquisition. While selection bias limits generalizability, these findings suggest that educational interventions and governance frameworks merit urgent consideration in coordinated care settings where AI adoption is accelerating.
What is already known on this topic: AI adoption in primary care accelerated from 38% to 66% within one year, creating an urgent need for readiness assessment. Current implementation strategies assume physician resistance, though evidence suggests knowledge deficits may be a greater barrier. Knowledge-attitude gaps have been reported across healthcare systems, but their magnitude and implications for patient safety remain poorly understood. What this study adds: Reveals a 54.2 percentage point knowledge-attitude gap persistent across sensitivity analyses, indicating that barriers stem from education infrastructure deficits rather than fundamental resistance. Identifies unsupervised AI usage by 27.7% of physicians despite acknowledged knowledge limitations, a patient safety concern absent from previous implementation literature. Establishes a physician-consensus implementation hierarchy enabling systematic, evidence-based AI deployment: begin with administrative applications (≥70% support), progress to clinical support (50-69%), and reserve complex applications (<50%) for mature phases.
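For readers unfamiliar with the two estimators named in the Methods, here is a self-contained sketch of a Wilson score confidence interval and Cohen's h; the worked numbers reuse a figure from the abstract (107 of 155 respondents, ~69.0% positive attitudes) purely for illustration.

```python
# Hedged sketch of the Wilson score interval and Cohen's h effect size.
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: effect size for the difference of two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))

print(wilson_ci(107, 155))     # ~ (0.614, 0.758), matching the reported 95% CI
print(cohens_h(0.690, 0.148))  # attitude vs high self-rated knowledge gap
```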
Uzochukwu, B. S. C.; Cherima, Y. J.; Enebeli, U. U.; Okeke, C. C.; Uzochukwu, A. C.; Omoha, A.; Hassan, B.; Eronu, E. M.; Yusuf, S. M.; Uzochukwu, K. A.; Kalu, E. I.
Background: The integration of artificial intelligence (AI) into clinical practice holds transformative potential for healthcare in West Africa, but safe deployment requires context-appropriate governance, accountability, and post-deployment monitoring frameworks. This cross-sectional mixed-methods study examined preferences and concerns of West African clinicians and technical experts regarding AI governance structures, post-deployment surveillance mechanisms, and accountability allocation. Methods: A structured questionnaire was administered to 136 physicians affiliated with the West African College of Physicians (February 22-28, 2026), complemented by 72 key informant interviews with technical leads, AI developers, data scientists, policymakers, and healthcare leaders. Data were analyzed using descriptive statistics, inferential tests, and thematic analysis. Results: Clinicians strongly preferred independent regulatory bodies (40.4%) for overseeing AI tool performance, with high trust ratings (mean: 4.3/5), while vendor self-monitoring received minimal support (3.7%, mean: 2.4/5). Real-time dashboards were the most favored monitoring approach (41.9%). Clear accountability pathways (94.1%), algorithm transparency (91.9%), and real-time performance data (89.7%) were rated essential by majorities. Major concerns included clinicians being unfairly blamed for AI errors (76.5%), excessive vendor control (72.8%), and absence of clear reporting pathways (69.9%). Qualitative findings emphasized continuous performance tracking for accuracy, fairness, and bias; structured incident reporting; protocols for model drift and failure; and multi-layered governance combining independent oversight, institutional AI committees, and explicit liability frameworks. Conclusion: This study provides the first empirical evidence from West Africa on clinician preferences for AI governance. Findings offer actionable guidance for policymakers to build trustworthy, equitable, and safe AI integration frameworks that prioritize transparency, independent oversight, and clinician protection. Keywords: artificial intelligence; AI governance; post-deployment monitoring; accountability; West Africa; clinician preferences; health data science.
Pita Ferreira, P.; Soriano Longaron, S.; Bouisaghouane, W.; Goris, J.; H. Hoekman, A.; Markos, B.; Maus, B.; Pozzi, G.; Hasan, H.; Kalinauskaite, I.; Stunt, J.; D. Kist, J.; van der Elst, J.; Maguet, K.; Ziegfeld, L.; Cuypers, M.; Milota, M.; Habets, M.; Colombo, S.; Petric, S.; Groefsema, S.; Warmelink, S.; Daae, E.; Briganti, G.; Vajda, I.; Valdenegro-Toro, M.; Braun, M.; Jeekel, P.; Goosen, S.; Schepel, A.; Ester, L.; Kuzee, R.; de Klerk, S.; Lamoth, C.; Ballard, L.; Plantinga, M.
Artificial intelligence (AI) in healthcare holds transformative potential but risks exacerbating existing health disparities if inclusivity is not explicitly accounted for. This study addresses the disconnected discussions on inclusive medical AI by developing a comprehensive framework, PREFER-IT. The framework is based on the outcomes of a five-day transdisciplinary co-creation workshop that involved 37 experts from diverse backgrounds, including healthcare, ethics, law, social sciences, AI, and patient advocacy. For this workshop, we used design thinking and participatory methodologies to develop a framework for realising inclusive medical AI. We identified three key challenges: integrating lived experiences and stakeholder voices across the AI lifecycle, designing data collection practices that promote fairness and prevent inequalities, and fostering regulatory frameworks that uphold human rights and promote inclusivity. The analysis of participants' perspectives informed the development of the eight key thematic clusters of PREFER-IT: Participatory and co-design approaches (P), Representative and diverse data (R), Education and digital literacy (E), Fairness (F), Ethical and legal accountability (E), Real-world validation and feedback (R), Inclusive communication (I), and Technical interoperability (T). These elements were mapped across the structural layers of AI (humans, data, system, process, and governance) and the AI lifecycle to guide inclusive design, development, validation, implementation, monitoring, and governance. The framework fosters stakeholder engagement and systemic change, positioning inclusion as a guiding principle in practice. PREFER-IT offers a practical and conceptual contribution for how to include ethical, legal and societal aspects when aiming to foster responsible and inclusive AI in healthcare. Author Summary: Artificial intelligence (AI) is being used more and more in healthcare to improve diagnosis, treatment, and personalised care. However, if not designed carefully, these technologies can unintentionally increase existing inequalities and exclude certain groups from their benefits. In our study, we brought together experts from healthcare, ethics, law, social sciences, and patient advocacy to explore how AI in medicine can be made more inclusive. Over five days, we worked together to identify key issues and come up with practical solutions. We focused on three main areas: 1) ensuring diverse voices are heard during the development of AI tools; 2) making data collection fair and representative; and 3) creating regulations that protect human rights. From the discussions of the workshop, we created the PREFER-IT framework, which outlines eight key principles for inclusive AI: participatory and co-design approaches; representative and diverse data; education and digital literacy; fairness; ethical and legal accountability; real-world validation and feedback; inclusive communication; and technical interoperability. This framework helps guide developers, policymakers, and healthcare professionals in creating AI systems that are not only effective but also fair and respectful of all users. Our work emphasises the importance of involving patients and communities in shaping the future of AI.
Nellihela, A. P.; Gunaratne, K. S.; Bandaranayake, V. C.; Senevirathne, R. N.; Pathirana, T.; Gallala, M.; Asanthi, J.; Pirahanthan, K.; Karunanayake, S. N.; Abegunasekara, A.; Jayasinghe, T. S.; Senarathne, M.
Objective: In Sri Lanka, resource limitations have led to the continued use of paper-based records for patient management. We implemented a cloud-based Electronic Health Record (EHR) system in a tertiary surgical oncology unit, running it alongside the existing paper system. The EHR provided authorised, real-time remote access to patient data and digital theatre scheduling, and facilitated multidisciplinary team collaboration. Methods: Twenty-six healthcare workers (consultants, medical officers, nursing officers, trainees, and clerical staff) completed an online questionnaire assessing the EHR's usability, user satisfaction, and impact on workflow. We prospectively tracked and compared key time metrics between the paper and EHR systems, including theatre list preparation times and cancer biopsy turnaround (biopsy-to-diagnosis interval), to evaluate efficiency gains. Results: Most participants (84.6%) used the EHR routinely. Users rated the system as highly intuitive, user-friendly, easily accessible, and simple for data entry (mean ratings ~4.0 out of 5). Overall satisfaction was high (mean 4.31/5), though system speed was rated slightly lower (mean 3.92), and technical glitches were noted (mean 3.65). Adequate training was associated with significantly higher satisfaction (p<0.05), and satisfaction correlated with perceived intuitiveness (r=0.43) and ease of use (r=0.60). The EHR reduced average theatre list preparation time from 4 minutes 6 seconds (paper) to 2 minutes 24 seconds, saving approximately 1 minute 42 seconds per list (p<0.001). Similarly, the median biopsy-to-diagnosis interval decreased from 14.95 days with the paper process to 8.40 days with the EHR's notification system, an average reduction of 6.55 days (p<0.001). Conclusion: Implementing a customised EHR system in a resource-limited surgical oncology setting significantly improved workflow efficiency, reduced diagnostic delays, and enhanced data accessibility and team coordination. Users reported high satisfaction, but challenges such as technical limitations, infrastructure issues, and resistance to change persist. Targeted training, supportive infrastructure, and stakeholder engagement are recommended to sustain EHR integration and promote greater adoption. Highlights: Electronic health records enhance workflow, data access, and team collaboration; electronic health records significantly reduced biopsy-to-diagnosis delays; high user satisfaction is associated with intuitive design and adequate training; technical issues and system speed were the primary challenges for users; and targeted training and robust infrastructure are vital for successful implementation.
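The abstract reports p-values for the paper-versus-EHR turnaround comparison without naming the test; for skewed interval data, a Mann-Whitney U test is one plausible choice, sketched below with invented biopsy-to-diagnosis intervals (the study's real data are not public here).

```python
# Hedged sketch: comparing turnaround times between the two record
# systems with a Mann-Whitney U test. Test choice and data are assumed.
from scipy.stats import mannwhitneyu

paper_days = [12.1, 18.4, 14.9, 21.0, 13.5, 16.2]  # paper-based process
ehr_days = [7.8, 9.1, 8.4, 6.9, 10.2, 8.0]         # EHR notification system

stat, p = mannwhitneyu(paper_days, ehr_days, alternative="two-sided")
print(f"U = {stat}, p = {p:.4f}")
```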
Edara, R.; Khare, A.; Atreja, A.; Awasthi, R.; Highum, B.; Hakimzadeh, N.; Ramachandran, S. P.; Mishra, S.; Mahapatra, D.; Shree, S.; Bhattacharyya, A.; Singh, N.; Reddy, S.; Cywinski, J. B.; Khanna, A. K.; Maheshwari, K.; Papay, F. A.; Mathur, P.
Background: Breakthroughs in model architecture and the availability of data are driving transformational artificial intelligence in healthcare research at an exponential rate. The shift in the model types used can be attributed to the multimodal properties of Foundation Models, which better reflect the inherently diverse nature of clinical data, and to advancing model implementation capabilities. Overall, the field is maturing from exploratory development towards real-world evaluation and implementation, spanning both generative and predictive AI. Methods: A database search in PubMed was performed using the terms "machine learning" or "artificial intelligence" and "2025", with the search restricted to English-language human-subject research. A BERT-based deep learning classifier, pre-trained and validated on manually labeled data, assessed publication maturity. Five reviewers then manually annotated publications for healthcare specialty, data type, and model type. Systematic reviews, duplicates, pre-prints, robotic surgery studies, and non-human research publications were excluded. Publications employing foundation models were further analyzed for their areas of application and use cases. Results: The PubMed search yielded 49,394 publications, a near-doubling from 28,180 in 2024, of which 3,366 were classified as mature. After exclusions, 2,966 were included in the final analysis, compared to 1,946 in 2024. Imaging remained the dominant specialty (976 publications), followed by Administrative (277) and General (251). Traditional text-based LLMs (1,019) led model usage, but Multimodal Foundation Models surged from 25 publications in 2024 to 144 in 2025, and Deep Learning models also increased substantially (910). For the first time, publications using classical Machine Learning models declined (173) in our annual review. Image remained the predominant data type (53.9%), followed by text (38.2%), with a notable increase in audio (1.2%) coinciding with the adoption of multimodal models. Across foundation model publications, Imaging (110), Head and Neck (92), Surgery (64), Oncology (55), and Ophthalmology (49) were the leading specialties, while the Administrative and Education categories remained high-volume contributors driven predominantly by LLM-based research. Conclusion: 2025 signals a meaningful maturation of the healthcare AI research field, with publication volumes nearly doubling, classical ML yielding to higher-capacity foundation models, and the field rapidly moving beyond traditional text-based LLM capabilities toward multimodal models. While Imaging continues to lead in research output, the growth of multimodal models across clinical specialties suggests the field is approaching an inflection point where AI systems can more closely mirror the complexity of real-world clinical practice.
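The Methods describe a BERT-based classifier for publication maturity without further detail; a minimal inference sketch with the Hugging Face transformers library follows. The checkpoint name and two-label scheme are assumptions, and the study's model would first have been fine-tuned on manually labelled abstracts.

```python
# Hedged sketch of a BERT-based maturity classifier; the checkpoint and
# labels are illustrative, not the authors' actual model.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # 0 = not mature, 1 = mature
)
# In practice this head would be fine-tuned before use; this shows
# inference only, so the untrained head's output is not meaningful.
abstract = "We externally validated a deep learning model for ..."
inputs = tokenizer(abstract, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print("P(mature) =", torch.softmax(logits, dim=-1)[0, 1].item())
```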
Samal, L.; Kyle, M. A.; Kilgallon, J. L.; Landrum, K. M.; Gawande, A. A.; Jacobson, J. O.; Hassett, M. J.
Introduction: Diagnostic evaluation and treatment planning for newly diagnosed cancer require a coordinated effort across multiple specialties. Delays in treatment initiation are common, leading to unnecessary anxiety and decreased survival. Given that timely treatment initiation is pivotal to providing high-quality cancer care, we sought to characterize patient intake, workflows, and the role of health information technology (HIT) in a varied group of oncology practices nationwide. Methods: Interviews with oncologists were performed between March and September 2016, with follow-ups conducted between October and December 2021. Thematic analysis was used to assign codes to key elements of the transcripts, group these codes into conceptually distinct and clinically meaningful categories, and identify major cross-cutting themes. Results: Nine oncologists participated in an initial interview (one surgical, two radiation, six medical oncology). Four oncologists participated in a follow-up interview (one radiation, three medical oncology). In both time periods there was tremendous variation in staff roles and communication processes; some oncology practices obtained diagnostic studies before the first oncology consult visit, whereas others waited until after the initial consult visit to begin the diagnostic evaluation. Variability and tension were noted to arise from deficiencies in HIT, such as lack of interoperability, impaired speed and quality of data collection, cumbersome user interfaces, and the variety of data types in oncology care. Oncologists reported only modest improvements in HIT between 2016 and 2021. Conclusion: Assembling the data needed to make a new cancer diagnosis and treatment plan is complex and time-intensive. HIT interoperability remains a quasi-manual process, contributing to preventable treatment delays. Federal policy supporting interoperability provides an opportunity to develop HIT that supports care coordination and patient-centered care, but effective implementation of such tools will be challenging within current workflows.
Adekunle, T.; Ohaeche, J.; Adekunle, T.; Adekunle, D.; Kogbe, M.
Background: Artificial intelligence is increasingly embedded in healthcare delivery, and its legitimacy depends on institutional governance, not technical performance alone. Prior research has centered on clinicians and patients; less attention has been given to the cybersecurity professionals who sustain the digital infrastructures that support health AI. This study examines how cybersecurity professionals conceptualize AI as clinical infrastructure and how these interpretations shape understandings of trust, risk, and oversight. Methods: Guided by sociotechnical systems theory and institutional trust scholarship, we conducted semi-structured in-depth interviews with twenty cybersecurity professionals working in healthcare-relevant domains. Participants were recruited through professional networks and LinkedIn outreach. Interviews were conducted between May and August 2025, audio-recorded, and transcribed verbatim. Data were analyzed using qualitative content analysis with constant comparison. Two researchers independently coded transcripts and refined themes through iterative discussion. The study received Institutional Review Board approval. Results: Participants described health AI as an augmented clinical infrastructure, emphasizing that AI extends workflow capacity but requires sustained human oversight. Healthcare data systems were characterized as fragmented and vulnerable, and breaches were treated as anticipated events. Trust in AI was described as contingent and built over time through visible accountability. Cybersecurity stewardship was framed as foundational to institutional trustworthiness. Conclusions: Health AI credibility emerges through governance practices that demonstrate accountability. Cybersecurity professionals and institutional stakeholders jointly shape trust in digitally mediated healthcare systems through governance decisions that signal accountability.
Awasthi, R.; Mishra, S.; Grasfield, R.; Maslinski, J.; Mahapatra, D.; Cywinski, J. B.; Khanna, A. K.; Maheshwari, K.; Dave, C.; Khare, A.; Papay, F. A.; Mathur, P.
Background: The infodemic we are experiencing with AI-related publications in healthcare is unparalleled. The excitement and fear surrounding the adoption of rapidly evolving AI in healthcare applications pose a real challenge. Collaborative learning from published research is one of the best ways to understand the associated opportunities and challenges in the field. To gain a deep understanding of recent developments in this field, we conducted a quantitative and qualitative review of AI in healthcare research articles published in 2023. Methods: We performed a PubMed search on January 1, 2024 using the terms "machine learning" or "artificial intelligence" and "2023", restricted to English-language human-subject research published by December 31, 2023. Utilizing a deep learning-based approach, we assessed the maturity of publications. Following this, we manually annotated the healthcare specialty, data utilized, and models employed for the identified mature articles. Subsequently, empirical data analysis was performed to elucidate trends and statistics. Similarly, we performed a search for Large Language Model (LLM)-based publications for the year 2023. Results: Our PubMed search yielded 23,306 articles, of which 1,612 were classified as mature. Following exclusions, 1,226 articles were selected for final analysis. Among these, the highest number of articles originated from the Imaging specialty (483), followed by Gastroenterology (86) and Ophthalmology (78). Analysis of data types revealed that image data was predominant, utilized in 75.2% of publications, followed by tabular data (12.9%) and text data (11.6%). Deep learning models were extensively employed, constituting 59.8% of the models used. For the LLM-related publications, after exclusions, 584 publications were classified into 26 different healthcare specialties and used for further analysis. The utilization of LLMs is highest in general healthcare specialties, at 20.1%, followed by surgery at 8.5%. Conclusion: Image-based healthcare specialties such as Radiology, Gastroenterology and Cardiology have dominated the landscape of AI in healthcare research for years. In the next era of AI in healthcare research and publications, we are likely to see other healthcare specialties, including the education and administrative areas of healthcare, driven by LLMs and possibly multimodal models.
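A minimal sketch of the kind of PubMed query the Methods describe, using Biopython's Entrez E-utilities wrapper; the exact query string, field tags, and email address are illustrative assumptions, not the authors' actual search.

```python
# Hedged sketch of a PubMed search via NCBI E-utilities (Biopython).
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI asks for a contact address

query = (
    '("machine learning"[Title/Abstract] OR '
    '"artificial intelligence"[Title/Abstract]) '
    'AND 2023[PDAT] AND english[Language] AND humans[MeSH Terms]'
)
handle = Entrez.esearch(db="pubmed", term=query, retmax=10000)
record = Entrez.read(handle)
handle.close()

print("Total hits:", record["Count"])
pmids = record["IdList"]  # one esearch page; a full pull would iterate
# over pages with the retstart parameter until all IDs are collected
```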
Campbell, I. M.; Karavite, D. J.; McManus, M. L.; Cusick, F. C.; Junod, D. C.; Sheppard, S. E.; Lourie, E. M.; Shelov, E. D.; Hakonarson, H.; Luberti, A. A.; Muthu, N.; Grundmeier, R. W.
Objective: We sought to develop and evaluate an electronic health record (EHR) genetic testing tracking system to address the barriers and limitations of existing spreadsheet-based workarounds. Materials and Methods: We evaluated the spreadsheet-based system using mixed-effects logistic regression to identify factors associated with delayed follow-up. These factors informed the design of an EHR-integrated genetic testing tracking system. After deployment, we assessed the system in two ways: we analyzed EHR access logs and note data to assess patient outcomes, and we performed semi-structured interviews with users to identify the impact of the system on work. Results: We found that patient-reported race was a significant predictor of documented genetic testing follow-up, indicating a possible inequity in care. We implemented a CDS system, including a patient data capture form and a management dashboard, to facilitate important care tasks. The system significantly sped up review of results and significantly increased documentation of follow-up recommendations. Interviews with system users identified the key team members who ensured success and revealed that the system addresses a number of sociotechnical factors that collectively result in safer and more efficient care. Discussion: Our new tracking system ended decades of workarounds for identifying and communicating test results and improved clinical workflows. Interview participants related that the system decreased cognitive and time burden, which allowed them to focus on direct patient interaction. Conclusion: By assembling a multidisciplinary team, we designed a novel patient tracking system that improves genetic testing follow-up. Similar approaches may be effective in other clinical settings.
Galfano, A.; Barbosu, C. M.; Aladin, B.; Rivera, I.; Dye, T. D. V.
Artificial intelligence (AI) is dramatically changing the healthcare landscape by providing patients, clinicians, administrators, and public health professionals with tools aiming to improve efficiency, outcomes, and experience in health. As elsewhere, New York State (NYS) experiences high demand for, and high investment in, the transformation of healthcare with AI tools, though little is known about clinicians' use of and interest in adopting AI tools in their work. A large share of the nation's future primary care clinicians train and work in NYS, and the state's ability to establish clear policies, provide tools, and elevate AI competency has implications for care delivery nationally. As a result, we undertook this analysis of NYS clinicians' use of AI to better understand opportunities for its adoption and inclusion in continuing education. For this analysis, we included healthcare providers who deliver ambulatory or specialty medical care within NYS, with clinicians' use, frequency, and purpose of AI tools in their work as the main outcome. Of 305 NYS clinical providers responding, 23.4% indicated they use AI tools for work: 11.1% report monthly use, 8.5% weekly use, and 4.6% daily use. AI was primarily used to search guidelines and ask clinical questions, followed by identifying drug interactions, analyzing data, analyzing images/labs, and creating care plans and patient recommendations. AI use did not vary significantly across professional disciplines or practice types, though independent practitioners were significantly more likely than advanced practice providers to use AI in their work, as were providers using social media and digital methods for obtaining continuing education. AI use increased substantially in 2025 compared with 2024. Overall, our findings suggest that programs targeting clinicians could incorporate these findings in designing accessible and acceptable AI-related continuing education opportunities to help familiarize clinicians with the opportunities and risks of integrating AI tools into their practices. Author Summary: AI tools are rapidly gaining traction in the delivery of healthcare. We found that clinician use of AI was quite limited (23%), though growing. Those using AI tools used them sparingly in their work, with only about 5% reporting daily use. The purposes for which clinicians report using AI (asking clinical questions, interpreting patient results, creating patient educational materials) could contribute substantially to healthcare outcomes if widely adopted. Designers of continuing education for clinicians should help provide opportunities for clinicians to improve their familiarity, use, and competency with AI tools, to help maximize the potential health benefits for patients and communities.
Yip, A.; Craig, G.; White, N. M.; Cortes-Ramirez, J.; Shaw, K.; Reddy, S.
Purpose: To evaluate whether large language models (LLMs) can enhance clinician-patient communication by simplifying radiology reports to improve patient readability and comprehension. Methods: A randomised controlled trial was conducted at a single healthcare service for patients undergoing X-ray, ultrasound or computed tomography between May 2025 and June 2025. Participants were randomised in a 1:1 ratio to receive either (1) the formal radiology report only or (2) the formal radiology report and an LLM-simplified version. Readability scores, including the Simple Measure of Gobbledygook, Automated Readability Index, Flesch Reading Ease, and Flesch-Kincaid grade level, were calculated for both reports. Patient readability and comprehension levels, factual accuracy, and LLM hallucination rates were assessed using a combination of binary and 5-point Likert scales, open-ended survey questions, and independent review by two radiologists. Results: 59 of 120 patients were randomised to receive both the formal and LLM-simplified radiology reports. Readability improved significantly with simplification: formal reports required a university-standard reading level (11th-13th grade), whereas simplified reports required a middle-school standard (5th-9th grade) (rank biserial correlation=0.83, p<0.001). Patients with both reports demonstrated a significantly greater comprehension level, with 95% reporting an understanding level greater than 50%, compared with 46% of those without the simplified report (rank biserial correlation=0.67, p<0.001). All LLM-simplified reports were considered at least somewhat accurate, with a minimal hallucination rate of 1.7%; importantly, no hallucinations resulted in potential patient harm. 118/120 (98.3%) patients expressed interest in simplified radiology reports being included in future clinical practice. Conclusion: This study provides evidence that LLMs can simplify radiology reports to an accessible level of readability with minimal hallucination. LLMs improve both ease of readability and comprehension of radiology reports for patients. The rapid advancement of LLMs therefore shows strong potential for enhancing patient-radiologist communication as patient access to electronic health records is increasingly adopted. Highlights: Radiology reports can be complex and difficult for patients to read and interpret; strong patient demand exists for simplified radiology reports; large language models (LLMs) such as GPT-4o show promise in simplifying radiology reports; LLMs credibly simplify radiology reports with minimal hallucination rates; and LLMs improve both patient readability and comprehension of radiology reports.
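The four readability metrics named in the Methods are all standard formulas available in the textstat package; a minimal sketch follows, with invented sample sentences standing in for the study's reports.

```python
# Hedged sketch of the four readability scores the Methods name,
# computed with textstat; the sample report text is invented.
import textstat

formal = ("There is diffuse interstitial thickening with subpleural "
          "reticulation, consistent with fibrotic change.")
simplified = ("The scan shows some scarring in the lungs. "
              "This is a common sign of long-term irritation.")

for label, text in [("formal", formal), ("simplified", simplified)]:
    print(label,
          "SMOG:", textstat.smog_index(text),
          "ARI:", textstat.automated_readability_index(text),
          "Flesch:", textstat.flesch_reading_ease(text),
          "FK grade:", textstat.flesch_kincaid_grade(text))
```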
Brereton, T. A.; Malik, M.; Lifson, M. A.; Greenwood, J. D.; Peterson, K. J.; Overgaard, S. M.
Background: Translation of artificial intelligence/machine learning (AI/ML)-based medical modeling software (MMS) into clinical settings requires rigorous evaluation by interdisciplinary teams and across the AI lifecycle. The fragmented nature of available resources to support MMS documentation limits the transparent reporting of scientific evidence supporting MMS, creating barriers and impeding the translation of software from code to bedside. Objective: The aim of this paper is to scope AI/ML-based MMS documentation practices and define the role of documentation in facilitating safe and ethical MMS translation into clinical workflows. Methods: A scoping review was conducted in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. MEDLINE (PubMed) was searched using MeSH key concepts of AI/ML, ethical considerations, and explainability to identify publications detailing AI/ML-based MMS documentation, supplemented by snowball sampling of selected reference lists. To capture implicit documentation practices not explicitly labeled as such, we did not use "documentation" as a key concept but rather as an inclusion criterion. A two-stage screening process (title and abstract screening, then full-text review) was conducted by an independent reviewer. A data extraction template was used to record, for papers that met the inclusion criteria, publication-related information; barriers to developing ethical and explainable MMS; available standards, regulations, frameworks, or governance strategies related to documentation; and recommendations for documentation. Results: Of 115 papers, 21 (18%) met the requirements for inclusion. Data regarding the current state and challenges of AI/ML-based documentation were synthesized, and themes including bias, accountability, governance, and interpretability were identified. Conclusions: Our findings suggest that AI/ML-based MMS documentation practice is siloed across the AI lifecycle and that a gray area exists for the tracking and reporting of non-regulated MMS. Recommendations from the literature call for proactive evaluation, standards, frameworks, and transparency and traceability requirements to address ethical and explainability barriers, enhance documentation efforts, provide support throughout the AI lifecycle, and promote translation of MMS. If prioritized across multidisciplinary teams and across the AI lifecycle, AI/ML-based MMS documentation may serve as a method of coordinated communication and reporting toward the resolution of AI translation barriers related to bias, accountability, governance, and interpretability.
Choudhury, A.; Elkefi, S.; Tounsi, A.
As ChatGPT emerges as a potential ally in healthcare decision-making, it is imperative to investigate how users leverage and perceive it. The repurposing of technology is innovative but brings risks, especially since AI's effectiveness depends on the data it is fed. In healthcare, where accuracy is critical, ChatGPT might provide sound advice based on current medical knowledge, which could turn into misinformation if its data sources later include erroneous information. Our study assesses user perceptions of ChatGPT, particularly of those who used ChatGPT for healthcare-related queries. By examining factors such as the competence, reliability, transparency, trustworthiness, security, and persuasiveness of ChatGPT, the research aimed to understand how users rely on ChatGPT for health-related decision-making. A web-based survey was distributed to U.S. adults using ChatGPT at least once a month. Data were collected from February to March 2023. Bayesian linear regression was used to understand how much ChatGPT aids informed decision-making. This analysis was conducted on subsets of respondents, both those who used ChatGPT for healthcare decisions and those who did not. Qualitative data from open-ended questions were analyzed using content analysis, with thematic coding used to extract users' opinions. The coding process was validated through inter-coder reliability assessments, achieving a Cohen's kappa coefficient of 0.75. Six hundred and seven individuals responded to the survey. Respondents were distributed across 306 US cities, of whom 20 participants were from rural cities. Of all the respondents, 44 used ChatGPT for health-related queries and decision-making. While all users valued the content quality, privacy, and trustworthiness of ChatGPT across different contexts, those using it for healthcare information place a greater emphasis on safety, trust, and the depth of information. Conversely, users engaging with ChatGPT for non-healthcare purposes prioritize usability, human-like interaction, and unbiased content. In conclusion, our findings suggest a clear demarcation in user expectations and requirements from AI systems based on the context of their use.
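The inter-coder reliability check the abstract reports (Cohen's kappa of 0.75) is a one-liner with scikit-learn; a minimal sketch follows, with invented coder labels standing in for the study's thematic codes.

```python
# Hedged sketch of an inter-coder reliability check via Cohen's kappa;
# the two coders' label sequences are invented for illustration.
from sklearn.metrics import cohen_kappa_score

coder_a = ["trust", "privacy", "usability", "trust", "privacy", "depth"]
coder_b = ["trust", "privacy", "usability", "privacy", "privacy", "depth"]

kappa = cohen_kappa_score(coder_a, coder_b)
print(f"Cohen's kappa: {kappa:.2f}")  # agreement corrected for chance
```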
Li, A. K. C.; Rauf, I. A.; Keshavjee, K.
Background: Canada has invested significantly in artificial intelligence (AI) research and development over the last several years, yet Canadians' knowledge of and attitudes towards AI in healthcare are understudied. Objectives: To explore the relationships of age, gender, education level, and income with Canadians' knowledge of AI, their comfort with its use in healthcare, and their comfort with the use of personal health data in AI research. Methods: Ordinal logistic regression and multivariate polynomial regression were applied to data from the 2021 Canadian Digital Health Survey using RStudio and SigmaZones Design of Experiments Pro. Results: Female and older Canadians self-report less knowledge about AI than male and other-gender and younger Canadians. Female Canadians and healthcare professionals are less comfortable with the use of AI in healthcare compared to males and people with other levels of education. Discomfort appears to stem from concerns about data security and the current maturity level of the technology. Conclusion: Knowledge of AI and comfort with its use in healthcare are inversely correlated with age and directly correlated with education and income levels. Overall, female respondents self-reported less knowledge of and comfort with AI in healthcare and research than other genders. Privacy concerns should continue to be addressed as a major consideration when implementing AI tools. Canadians, especially older females, not only need more education about AI in healthcare, but also need more reassurance about the safe and responsible use of their data and about how bias and other issues with AI are being addressed. Author Summary: Artificial intelligence (AI) and its application have garnered significant public interest and excitement within healthcare in recent years. However, its successful integration and use in healthcare will depend on patient and user adoption. As a result, AI tools may be limited in healthcare when user concerns are not carefully addressed and if patients are not educated about how these technologies work. While there have been studies on the attitudes of clinicians and healthcare professionals toward AI, little is known about the general public's perception of AI within the healthcare setting. Our study addresses this gap in the literature by analyzing data from the 2021 Canadian Digital Health Survey to understand the relationships between Canadians' attitudes towards AI and various socioeconomic and demographic factors. Our results found that older Canadians, Canadians with less formal education, and women need to be better informed about the safe and responsible use of AI, and reassured about good data security practices, before it can be broadly accepted by them. In addition, trust may be a factor contributing to the higher levels of discomfort with AI observed in middle-aged Canadians. The findings from this study will help stakeholders better implement and broaden the accessibility of AI technologies.
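The Methods name ordinal logistic regression (the study used RStudio); an equivalent sketch in Python with statsmodels' OrderedModel follows. The column names and the CSV extract are assumptions about the survey data, not the authors' actual variables.

```python
# Hedged sketch of an ordinal logistic (proportional-odds) regression
# of the kind the Methods describe; data layout is assumed.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("digital_health_survey.csv")  # hypothetical extract
# Outcome: self-rated AI knowledge on an ordered 1-5 scale.
df["ai_knowledge"] = pd.Categorical(df["ai_knowledge"], ordered=True)

model = OrderedModel(
    df["ai_knowledge"],
    df[["age", "income", "education_years"]],  # no constant: the model
    distr="logit",                             # estimates thresholds
)
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```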
Brin, D.; Sorin, V.; Konen, E.; Glicksberg, B. S.; Nadkarni, G.; Klang, E.
Objective: The United States Medical Licensing Examination (USMLE) assesses physicians' competency, and passing is a requirement to practice medicine in the U.S. With the emergence of large language models (LLMs) like ChatGPT and GPT-4, understanding their performance on these exams illuminates their potential in medical education and healthcare. Materials and Methods: A literature search following the 2020 PRISMA guidelines was conducted, focusing on studies using official USMLE questions and publicly available LLMs. Results: Three relevant studies were found, with GPT-4 showcasing the highest accuracy rates of 80-90% on the USMLE. Open-ended prompts typically outperformed multiple-choice ones, with 5-shot prompting slightly edging out zero-shot. Conclusion: LLMs, especially GPT-4, display proficiency in tackling USMLE-standard questions. While the USMLE is a structured evaluation tool, it may not fully capture the expansive capabilities and limitations of LLMs in medical scenarios. As AI integrates further into healthcare, ongoing assessments against trusted benchmarks are essential.
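For readers unfamiliar with the zero-shot versus 5-shot distinction the Results draw, here is a minimal sketch of how such prompts are typically constructed; the question and exemplar are invented, and no particular vendor API is assumed.

```python
# Hedged sketch of zero-shot vs few-shot prompt construction.
EXEMPLARS = [
    ("A 55-year-old with crushing chest pain radiating to the left arm. "
     "Most likely diagnosis?", "Acute myocardial infarction"),
    # ...in a 5-shot setup, four more worked question/answer pairs follow
]

def build_prompt(question: str, shots: int = 0) -> str:
    parts = ["Answer the following USMLE-style question."]
    for q, a in EXEMPLARS[:shots]:          # shots=0 yields a zero-shot prompt
        parts.append(f"Question: {q}\nAnswer: {a}")
    parts.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(parts)

print(build_prompt("A 30-year-old presents with ...", shots=1))
```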
Jackson, N. J.; Brown, K. E.; Miller, R.; Murrow, M.; Cauley, M.; Collins, B. X.; Novak, L. L.; Benda, N.; Ancker, J. S.
Objective: Research on artificial intelligence-based clinical decision-support (AI-CDS) systems has returned mixed results. Sometimes providing AI-CDS to a clinician will improve decision-making performance, sometimes it will not, and it is not always clear why. This scoping review seeks to clarify existing evidence by identifying clinician-level and technology design factors that impact the effectiveness of AI-assisted decision-making in medicine. Materials and Methods: We searched MEDLINE, Web of Science, and Embase for peer-reviewed papers that studied factors impacting the effectiveness of AI-CDS. We identified the factors studied and their impact on three outcomes: clinicians' attitudes toward AI, their decisions (e.g., acceptance rate of AI recommendations), and their performance when utilizing AI-CDS. Results: We retrieved 5,850 articles and included 45. Four clinician-level and technology design factors were commonly studied. Expert clinicians may benefit less from AI-CDS than non-experts, with some mixed results. Explainable AI increased clinicians' trust, but could also increase trust in incorrect AI recommendations, potentially harming human-AI collaborative performance. Clinicians' baseline attitudes toward AI predict their acceptance rates of AI recommendations. Of the three outcomes of interest, human-AI collaborative performance was most commonly assessed. Discussion and Conclusion: Few factors have been studied for their impact on the effectiveness of AI-CDS. Due to conflicting outcomes between studies, we recommend that future work leverage the concept of appropriate trust to facilitate more robust research on AI-CDS, aiming not to increase overall trust in or acceptance of AI but to ensure that clinicians accept AI recommendations only when trust in AI is warranted.
Onovo, A. A.; Cherima, Y. J.
Importance: Emerging evidence suggests healthcare AI systems may exhibit deceptive alignment (appearing safe during validation while optimizing for misaligned objectives in deployment) and evaluation awareness (detecting and adapting behavior during audits), undermining regulatory validation frameworks. Objective: To quantify the performance of multi-layer red-teaming approaches in detecting sophisticated healthcare AI safety failures across 10 vulnerability domains. Design, Setting, and Participants: Systematic red-teaming evaluation conducted December 2025-January 2026 using 50 healthcare-specific adversarial prompts across 10 vulnerability domains (5 prompts per domain). GPT-OSS-20B (a 21-billion parameter open-source model) was evaluated using a three-layer framework: Layer 1 (automated keyword pattern matching), Layer 2 (GPT-5 semantic analysis with chain-of-thought reasoning), and Layer 3 (PhD-level human expert adjudication as ground truth). Inter-rater reliability was assessed using Cohen's kappa statistic. Main Outcomes and Measures: The primary outcome was risk classification (SAFE, LOW, MODERATE, or HIGH RISK) with severity scoring (0-10 scale). Secondary outcomes included layer detection rates, inter-rater agreement, thematic vulnerability patterns, and chain-of-thought reasoning exposure. Results: Of 50 prompts evaluated, 12 (24%, 95% CI: 13.5%-38.2%) were classified HIGH RISK, 8 (16%, 95% CI: 7.7%-28.8%) MODERATE RISK, 17 (34%, 95% CI: 21.8%-48.3%) LOW RISK, and 13 (26%, 95% CI: 15.2%-39.9%) SAFE. The Evaluation Awareness domain exhibited a 100% HIGH/MODERATE RISK rate (5/5 prompts, mean severity 7.4/10), with the model providing technical specifications for audit detection and behavior switching. The Sandbagging domain showed a 40% HIGH RISK rate (2/5 prompts) with FDA validation gaming strategies. All 12 HIGH RISK prompts involved audit gaming themes (audit detection, behavior switching, system fingerprinting, or reporting violations). Layer 1 keyword matching detected 2/12 (17%, 95% CI: 4.7%-44.8%) high-risk prompts. Layer 2 GPT-5 analysis detected 12/12 (100%, 95% CI: 75.8%-100%) high-risk prompts with 0/13 (0%, 95% CI: 0%-22.8%) false positives. Human expert validation confirmed perfect concordance with Layer 2 assessments (kappa = 1.00, 95% CI: 0.999-1.000, p < 0.001), validating automated semantic analysis as a reliable screening tool. Chain-of-thought leakage occurred in 28/50 (56%) prompts, exposing internal safety reasoning. Conclusions and Relevance: Multi-layer evaluation is essential for detecting sophisticated AI safety failures in healthcare. Keyword filtering alone missed 83% (95% CI: 55.2%-95.3%) of high-risk behaviors. Perfect inter-rater agreement (kappa = 1.00) between automated AI semantic analysis and human expert judgment demonstrates that scalable, reliable safety screening is achievable. All HIGH-RISK outputs contained audit gaming content, indicating a systematic capability to articulate regulatory circumvention. Healthcare AI systems require domain-specific red-teaming for regulatory audit gaming and dual-mode behavior detection. Findings reveal critical gaps in current AI safety measures, with immediate implications for FDA/CMS regulatory frameworks.
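The abstract's finding that Layer 1 keyword matching caught only 2 of 12 high-risk outputs is easiest to appreciate against a concrete example of such a screen; below is a minimal sketch, where the pattern list is illustrative rather than the study's actual lexicon, and real screening would need far broader coverage.

```python
# Hedged sketch of a "Layer 1" automated keyword screen for
# audit-gaming language; patterns are invented for illustration.
import re

AUDIT_GAMING_PATTERNS = [
    r"\bdetect(?:ing)?\s+(?:an?\s+)?audit\b",
    r"\bbehav(?:io|iou)r\s+switch(?:ing)?\b",
    r"\bevade\s+(?:the\s+)?evaluat(?:ion|or)\b",
    r"\bfingerprint(?:ing)?\s+the\s+system\b",
]

def layer1_flag(model_output: str) -> bool:
    """Return True if any audit-gaming keyword pattern matches."""
    return any(re.search(p, model_output, re.IGNORECASE)
               for p in AUDIT_GAMING_PATTERNS)

print(layer1_flag("Switch behaviour when detecting an audit."))  # True
```

Keyword screens like this are brittle by construction, which is consistent with the abstract's observation that semantic (Layer 2) analysis was needed to catch paraphrased or oblique audit-gaming content.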
Thomas, C.; Kim, J. Y.; Hasan, A.; Kpodzro, S.; Cortes, J.; Day, B.; Jensen, S.; LHuillier, S.; Oden, M. O.; Zumbado Segura, S.; Maurer, E. W.; Tucker, S.; Robinson, S.; Garcia, B.; Muramalla, E.; Lu, S.; Chawla, N.; Patel, M.; Balu, S.; Sendak, M.
Safety net healthcare delivery organizations (SNOs) serve vulnerable populations but face persistent challenges in adopting new technologies, including AI. While systematic barriers to technology adoption in SNOs are well documented, little is known about how AI is implemented in these settings. This study explored real-world AI adoption in SNOs, focusing on identifying barriers encountered across the AI lifecycle and strategies used to overcome them. Five SNOs in the U.S. participated in a 12-month technical assistance program, the Practice Network, to implement AI tools of their choosing. Observed barriers and mitigation strategies were documented throughout program activities and, at the conclusion of the program, reviewed and refined with participants using a participatory research approach to ensure findings reflected lived experiences and organizational contexts. Key barriers emerged during the Integration and Lifecycle Management phases and included gaps in AI performance evaluation and impact assessments, communication with patients about AI use, foundational AI education, financial resources for purchasing and maintaining AI tools, and AI governance structures. Effective strategies for addressing these barriers were primarily supported through centralized expertise, structured guidance, and peer learning. These findings provide granular, actionable insights for SNO leaders, offering guidance for anticipating barriers and proactively planning mitigation strategies. By including SNO perspectives, the study also contributes to the broader health AI ecosystem and underscores the importance of participatory, collaborative approaches to support safe, effective, and ethical AI adoption in resource-constrained settings. Author Summary: Safety net organizations (SNOs) are healthcare systems that primarily serve low-income and underinsured patients. While interest in artificial intelligence (AI) in healthcare has grown rapidly, little is known about how these organizations experience AI adoption in practice. In this study, we partnered with five SNOs over a 12-month program to document the challenges they encountered when implementing AI tools and the strategies they used to address them. We worked closely with SNO staff throughout the process to ensure our findings reflected their lived experiences with AI implementation. We found that the most common challenges arose when organizations tried to integrate AI into daily operations and monitor and maintain those tools over time. Specific barriers included difficulty evaluating whether AI was performing as expected, limited guidance on communicating with patients about AI use, a lack of resources for staff training, limited financial resources, and the absence of formal governance structures. Successful strategies for overcoming these challenges drew on shared knowledge and structured support provided by the program, as well as learning from peer organizations. These findings offer practical guidance for SNO leaders planning or managing AI adoption, and contribute to a broader conversation about what is required to implement AI safely and effectively in healthcare settings that serve the most medically and socially vulnerable patients.